Objectives: To develop and evaluate time series models to predict the daily number of patients visiting the Emergency
Department (ED) of a Korean hospital. Methods: Data were collected from the hospital information system database. To
develop the forecasting models, we used 2 years of data, from January 2007 to December 2008; data for the following
3 consecutive months were used for validation. Calendar and weather variables were used to establish the forecasting
models. Three forecasting models were established: 1) moving average; 2) univariate seasonal auto-regressive integrated
moving average (SARIMA); and 3) multivariate SARIMA. To evaluate goodness-of-fit, residual analysis, the Akaike
information criterion, and the Bayesian information criterion were compared. The forecast accuracy of each model was
evaluated via the mean absolute percentage error (MAPE). Results: The multivariate SARIMA model was the most appropriate
for forecasting the daily number of patients visiting the ED. Its MAPE of 7.4% was the smallest among the models, and
for this reason it was selected as the final model. Conclusions: This study applied explanatory variables to a
multivariate SARIMA model. The multivariate SARIMA model exhibits relatively high reliability and forecasting accuracy.
The weather variables play a part in predicting daily ED patient volume.

I. Introduction

In recent years, supply has failed to keep up with continuous
increases in the demand for emergency medical services
[1], which has resulted in problems with overcrowding in
Emergency Departments (EDs) [2-4]. ED overcrowding
has been shown to affect not only patient satisfaction, but
also the quality of treatment and prognosis [5-8]. A variety
of measures have been taken thus far to address this ED
crowding problem [9-11], including staff supplementation,
expansions of beds and spaces, diversification of test
equipment, the establishment of walk-in clinics for the treatment
of light illnesses, the use of hallways, operation of observation
units, and allocations of staff and resources according to demand
[12,13]. Several studies concerning demand forecasting for the
allocation of staff and resources have already been
conducted [14-16]. In the emergency medical services field,
Spencer et al. previously attempted to forecast demand in
the ED using a variety of time series analysis methods [17],
and Sun et al. [18] forecasted the numbers of daily patients
in an individual ED by considering multivariate factors. Lee et
al. [19] employed multiple linear regressions to analyze the
factors affecting the number of patients visiting EDs in Korea.
However, thus far, no studies have been conducted regarding
forecasts of the daily numbers of patients visiting EDs.

The principal objective of this study was to construct a 
model by which the number of patients visiting a regional 
emergency center per day could be predicted, considering 
calendar and weather data using a moving average, uni- 
variate seasonal auto-regressive integrated moving average 
(SARIMA) and multivariate SARIMA; these models were 
compared and evaluated. 

II. Methods 

1. Data Sources

This study used a retrospective design, with a dataset
extracted from the hospital information system (HIS) database
of a tertiary hospital. The data utilized
for analysis include 189,511 events involving patients who
visited the ED from January 2007 to March 2009, excluding
their identities (names and hospital numbers). The data for
the first two years (Jan. 2007 to Dec. 2008) were used to
construct the demand forecast model, whereas those from the
following three months (Jan. to Mar. 2009) were used to evaluate
the model. Weather data were acquired from the weather
agency's website [20].

2. Ethical Consideration 

This study employed the total number of patients visiting
the ED per day, and used no identifiers or personal
information of patients. Therefore, this study was not
subject to review by the Institutional Review Board.

3. Data Preprocessing and Variable Selection 

The daily number of patients visiting the ED was employed
as the dependent variable, whereas the month, day of the week,
quarter of the year, holidays, Chuseok (Y/N), seasons, and
weather factors [20] (average temperature, minimal temperature,
maximal temperature, temperature gap, rain (Y/N), snow (Y/N),
air-velocity, relative humidity, and yellow dust (Y/N)) were
employed as independent variables (Table 1).

The number of patients visiting the ED per day was calculated
by counting the patients who visited from midnight to the
following midnight. Holidays include public holidays and Sundays.
Saturdays and weekdays immediately following a holiday are
defined as "After Holiday". Chuseok, which differs significantly
from other holidays, is classed as a separate variable.
Precipitation exceeding 10 mm, which ordinary people tend to
regard as a significant amount of rain, was the threshold for
"raining". The operational definition for "snowing" in this study
was an abundance of piled-up snow, which again exceeds the
general notion of "snowing".

Table 1. Definition of variables

Variables              Explanation
---------------------  ------------------------------------------------------------
Month                  January, February, March, April, May, June, July, August,
                       September, October, November, December
Day of the week        Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday
Quarters of the year   1Q, 2Q, 3Q, 4Q
Holiday                Weekdays = 0, Holiday = 1, After holiday = 2
Chuseok (Y/N)          Chuseok (Yes = 1, No = 0)
Seasons                Spring, Summer, Fall, Winter
Average temperature    Average temperature
Minimal temperature    Minimal temperature
Maximal temperature    Maximal temperature
Temperature gap        Maximum-minimum temperature
Rain (Y/N)             Rain (Yes [> 10 mm] = 1, No = 0)
Snow (Y/N)             Snow (Yes = 1, No = 0)
Air-velocity           The speed of the wind
Relative humidity      Relative humidity
Yellow dust (Y/N)      Sandy dust phenomena (Yes = 1, No = 0)
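
The holiday and rain definitions above could be coded along the following lines. This is an illustrative sketch, not the authors' procedure: the function names `holiday_code` and `rain_flag` and the `public_holidays` argument are hypothetical, and treating every Monday as "After holiday" (because it immediately follows a Sunday) is an assumption drawn from a literal reading of the definitions.

```python
from datetime import date, timedelta

def holiday_code(day, public_holidays):
    """Return 1 for a holiday, 2 for an 'after holiday' day, 0 for other weekdays.

    public_holidays is a caller-supplied set of dates (hypothetical input).
    """
    def is_holiday(d):
        # Holidays are public holidays and Sundays (weekday() == 6).
        return d in public_holidays or d.weekday() == 6

    if is_holiday(day):
        return 1
    # Saturdays, and days immediately following a holiday, count as "after holiday".
    if day.weekday() == 5 or is_holiday(day - timedelta(days=1)):
        return 2
    return 0

def rain_flag(precipitation_mm, threshold_mm=10.0):
    """Return 1 when daily precipitation exceeds the 10 mm threshold, else 0."""
    return 1 if precipitation_mm > threshold_mm else 0

new_year = {date(2008, 1, 1)}                      # example public holiday set
print(holiday_code(date(2008, 1, 6), new_year))    # a Sunday -> 1
print(holiday_code(date(2008, 1, 8), new_year))    # an ordinary Tuesday -> 0
print(rain_flag(12.5))                             # above the threshold -> 1
```

Chuseok would be coded separately as its own 0/1 variable, as Table 1 specifies.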

4. Modeling Technique 

Among the available time series analysis methods, the moving
average method, the univariate SARIMA model, and the multivariate
SARIMA model were used in this study. The moving average
method [21], which is known to be the simplest of the forecasting
methods, uses past time series data (yearly, monthly, quarterly)
to calculate the arithmetic mean. Its principal advantage is its
capacity to remove irregular changes or seasonal factors with
relative ease. The seasonal ARIMA model [18,19] is an expanded
form of the ARIMA model that allows seasonal factors to be
reflected. Unlike the moving average, the trend of the time series
and the seasonal trend are removed to achieve stationarity prior
to forecasting. The SARIMA model consists of three components,
1) auto-regression, 2) differencing, and 3) moving average,
and is represented as SARIMA(p, d, q)(P, D, Q)S, in which
(p, d, q) represents the non-seasonal part, (P, D, Q) represents
the seasonal part, and the subscript S represents the length of
the seasonal cycle. The p, d, q and P, D, Q represent the orders of
auto-regression, differencing, and moving average, respectively.
The SARIMA model is known to be effective when the components
of a time series change rapidly over time, and this model has proven
useful in the forecasting of short-term volatility. Unlike the
univariate SARIMA model, the multivariate SARIMA model
[18] adds explanatory variables to the SARIMA model,
which illustrates the manner in which an alteration in such a
variable can influence the dependent variable. In this study,
the number of patients visiting the ED daily was employed
as the dependent variable, and the calendrical and meteorological
information was utilized as independent variables for
the construction of the forecasting model. The SPSS Time
Series Modeler (SPSS ver. 15.0; SPSS Inc., Chicago, IL, USA)
was used in the construction of the forecasting models and the
comparisons.
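
For reference, the SARIMA notation described above corresponds to the standard multiplicative seasonal ARIMA form, written here with the backshift operator B (B y_t = y_{t-1}). This is the textbook Box-Jenkins formulation, given only as a sketch: the sign convention of the moving average terms and the placement of the constant c differ between software packages, so the form used by the SPSS Time Series Modeler may not match it term for term.

```latex
\phi_p(B)\,\Phi_P(B^{S})\,(1-B)^{d}\,(1-B^{S})^{D}\, y_t
  = c + \theta_q(B)\,\Theta_Q(B^{S})\,\varepsilon_t,
\qquad
\begin{aligned}
\phi_p(B) &= 1 - \phi_1 B - \dots - \phi_p B^{p}, &
\Phi_P(B^{S}) &= 1 - \Phi_1 B^{S} - \dots - \Phi_P B^{PS},\\
\theta_q(B) &= 1 - \theta_1 B - \dots - \theta_q B^{q}, &
\Theta_Q(B^{S}) &= 1 - \Theta_1 B^{S} - \dots - \Theta_Q B^{QS}.
\end{aligned}
```

Here y_t is the daily number of ED patients and ε_t is a white noise error term. For daily data with a weekly cycle, S = 7. In one common formulation of the multivariate case (an assumption here, not a statement about the SPSS implementation), y_t is replaced by y_t minus a weighted sum of the explanatory variables, so that the SARIMA structure describes what the regressors leave unexplained.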

5. Model Evaluation 

In order to compare the adequacy and performance of the
constructed models, residual analysis [22] was conducted
and the Akaike Information Criterion (AIC) [23], Bayesian
Information Criterion (BIC) [24], and Mean Absolute
Percentage Error (MAPE) [17] were calculated. Residual analysis
is employed in time series modeling to determine whether
the residuals, which are the differences between the predicted
and observed values, behave as white noise. If the
residuals move randomly around 0 (the average of the
residuals) in the time series diagram with constant deviation,
and the autocorrelation function of the residuals falls within the
confidence interval at all lags, then the residuals
are statistically independent; thus, we can be assured of
the fitness of the model.
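
The autocorrelation check described above can be sketched in a few lines of Python. This is an illustrative calculation rather than the SPSS procedure used in the study: the function names are hypothetical, the residuals are made up, and the bound of 1.96 divided by the square root of n is the usual large-sample approximation to a 95% confidence interval for white noise.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]

def lags_outside_bounds(residuals, max_lag=14, z=1.96):
    """Lags whose autocorrelation falls outside the approximate 95% bounds."""
    bound = z / math.sqrt(len(residuals))
    acf = sample_acf(residuals, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]

# Made-up residuals with a leftover weekly pattern: the weekly lags get flagged.
weekly_residuals = [1, 0, 0, 0, 0, 0, 0] * 20
print(lags_outside_bounds(weekly_residuals))
```

An empty list would indicate that no lag exceeds the bounds, which is the outcome the white noise criterion looks for.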

Table 2. The results of comparison between training data set and validation data set by χ² test

Characteristics  Category                     Training data set (%)   Validation data set (%)   p-value
---------------  ---------------------------  ----------------------  ------------------------  -------
Sex              Male                         47.90                   47.10                     0.876
                 Female                       52.10                   52.90
Age group        Geriatric (age > 65)         34.00                   31.90                     0.627
                 Adult (16 < age < 65)        37.00                   42.00
                 Pediatric (age < 16)         28.90                   26.10
ER care result   Discharge                    25.60                   26.90                     0.237
                 Others                       3.60                    0
                 Death                        9.00                    5.90
                 Admission                    25.60                   31.10
                 Transfer                     15.70                   17.60
                 Cancel                       20.50                   18.50
Emergency zone   1st zone (New patient area)  15.10                   18.50                     0.071
                 2nd zone (Med.-adult)        14.20                   14.30
                 2nd zone (Surgery-adult)     13.30                   18.50
                 3rd zone (Child area)        12.30                   14.30
                 9th zone (CPR, ICU)          16.60                   18.50
                 Delivery                     14.50                   5.90
                 Waiting room for admission   9.90                    10.10

ER: emergency room, CPR: cardiopulmonary resuscitation,
ICU: intensive care unit.

Healthcare Informatics Research (www.e-hir.org), doi: 10.4258/hir.2010.16.3.158
Prediction of Daily ED Patient Numbers

Because the partial auto-correlation function (PACF) and the
auto-correlation function (ACF) do not clearly distinguish between
candidate SARIMA models, the AIC and BIC values of the
forecasting models were compared and the model with
the smallest values was selected as the final forecasting model.
MAPE represents the relative scale of the forecasting error between
the forecasted value and the observed value of the series;
the smaller the error, the more accurate the forecast is.
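
As a worked illustration of the criteria in this section, the sketch below uses the textbook definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L (k estimated parameters, n observations, maximized likelihood L), together with MAPE. The function names and the sample numbers are hypothetical, and statistical packages such as SPSS may report scaled or normalized variants of these criteria.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion (textbook form); smaller is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (textbook form); smaller is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    errors = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return 100 * sum(errors) / len(errors)

observed = [230, 250, 210, 240]    # made-up daily visit counts
predicted = [220, 255, 225, 228]   # made-up forecasts
print(round(mape(observed, predicted), 2))   # 4.62
```

Because BIC penalizes each extra parameter by ln n rather than 2, it favors smaller models than AIC once n exceeds about 7 observations.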

III. Results 

Data collected from the 2007-2008 period were employed
in the development of the model used to forecast the daily
numbers of patients visiting the ED. The total number of
patients who visited the ED during that period was 169,375,
with an annual average of 84,668 and a daily average of 232.
Chi-square tests were performed in order to ascertain whether any
significant differences could be detected between the training and
validation datasets, and no significant differences were detected at
the 95% confidence level (Table 2).

The time series diagram shows that the number of patients 
per day begins to increase on Saturday and peaks on Sunday, 
and then begins to decrease on Monday and stays low until 
Friday, thus describing a 7-day cycle (Figure 1). 

That diagram also demonstrates that the number of visiting
patients trends upward over time. The first seasonal difference
was applied to remove the seasonal trend. As a consequence,
the time series diagram suggests a stationary time series, in
which no clear change in the mean or deviation could be observed
(Figure 2).
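
A first seasonal difference with a 7-day period subtracts from each day's count the count observed one week earlier. The sketch below shows the operation on made-up data; the function name and the numbers are hypothetical and are not taken from the study's dataset.

```python
def seasonal_difference(series, period=7):
    """Return y[t] - y[t - period] for t >= period (first seasonal difference)."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

# Made-up daily counts: a fixed weekly profile plus a slow upward drift of 1 per day.
weekly_profile = [260, 225, 215, 210, 212, 220, 245]   # Sunday through Saturday
series = [weekly_profile[t % 7] + t for t in range(28)]

diffed = seasonal_difference(series)
print(diffed[:5])   # [7, 7, 7, 7, 7]
```

In this toy series the weekly profile cancels exactly and the linear drift is reduced to a constant, which is why the differenced series looks stationary.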

Figure 1. Time plots of daily emergency department (ED) patients
(2007.01-2009.03). During the period from January
2007 to March 2009, a total of 189,511 patients visited the ED,
and the average number of daily patients was 231.
The sequence plot showed a 7-day periodicity and a
seasonal trend. In particular, there was a sharp increase
in the number of patients during Chuseok.

Three models were constructed using the SPSS Time Series
Modeler: MA(2) for the moving average model,
SARIMA(1,0,1)(0,1,1)₇ for the univariate SARIMA model,
and SARIMA(1,0,2)(0,1,1)₇ for the multivariate SARIMA
model. Parameter estimation using the maximum likelihood
method showed that only the variables Chuseok, seasons
(spring, summer, fall, and winter), average temperature, and
rain could be selected as explanatory variables for the
multivariate SARIMA model (Table 3).

Residual analysis, conducted to determine the adequacy
of the constructed models, shows that for both the univariate
SARIMA model and the multivariate SARIMA model the
residuals move randomly but are centered on 0, with constant
deviation and autocorrelation functions that fall within the
confidence interval, thereby indicating that the residuals are
independent and fulfill the 'white noise' criterion.

The diagram that compares the forecasted and observed
values over three months for the three forecasting models
shows that the MA(2) forecast essentially reproduces the mean
of the observed values, thus reflecting its inadequacy as a
prediction model (Figure 3).
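
One possible explanation for the flat MA(2) forecast is a general property of moving average models: more than q steps ahead, the forecast of an MA(q) model equals its constant. The sketch below illustrates this with the model [A] estimates reported in Table 3 (constant 231.8, MA coefficients -0.600 and -0.305). It assumes the Box-Jenkins sign convention y_t = μ + e_t - θ1 e_{t-1} - θ2 e_{t-2}, which may differ from the convention used by SPSS, and the two most recent residuals are made-up values.

```python
def ma2_forecasts(mu, theta1, theta2, last_resid, prev_resid, horizon):
    """Forecasts 1..horizon steps ahead for y_t = mu + e_t - theta1*e_{t-1} - theta2*e_{t-2}."""
    forecasts = []
    for h in range(1, horizon + 1):
        if h == 1:
            value = mu - theta1 * last_resid - theta2 * prev_resid
        elif h == 2:
            value = mu - theta2 * last_resid
        else:
            value = mu   # all relevant shocks are in the future, with expectation zero
        forecasts.append(value)
    return forecasts

# Constant and MA coefficients from model [A]; the residuals 10.0 and -5.0 are hypothetical.
print(ma2_forecasts(231.8, -0.600, -0.305, last_resid=10.0, prev_resid=-5.0, horizon=5))
```

From the third step onward every forecast equals 231.8, so a three-month forecast from such a model is a horizontal line at the series mean after the first two days.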

In contrast, the SARIMA models exhibit a pattern of
change similar to the observed values. Visually, it is difficult
to distinguish between the univariate and multivariate
SARIMA models. The AIC and BIC values, which compare
the adequacy of the models, are better for the SARIMA models
than for the moving average model, with the multivariate
SARIMA(1,0,2)(0,1,1)₇ model showing better values
than the univariate SARIMA model (Table 4). A
MAPE comparison of each model's forecasting accuracy
shows that the MA(2) model scored 12.9%,
the univariate SARIMA(1,0,1)(0,1,1)₇ model scored
7.8%, and the multivariate SARIMA(1,0,2)(0,1,1)₇ model
scored 7.4%, thus identifying the last model as the most
accurate forecasting model (Table 4); the normalized BIC
values for the training and test data are also presented for
comparison.

Figure 2. The time series after applying the first seasonal difference.

Table 3. Model parameters

Model  Variable             Term                 Lag  Estimate  SE      t        Sig.
-----  -------------------  -------------------  ---  --------  ------  -------  -----
[A]    Daily ED patients    Constant                  231.8     2.774   83.571   0.000
                            MA                   1    -0.600    0.035   -16.997  0.000
                            MA                   2    -0.305    0.035   -8.638   0.000
[B]    Daily ED patients    Constant                  0.677     0.304   2.224    0.026
                            AR                   1    0.759     0.041   18.382   0.000
                            MA                   1    0.279     0.059   4.690    0.000
                            Seasonal difference       1
                            MA, seasonal         1    0.9066    0.018   50.348   0.000
[C]    Daily ED patients    Constant                  0.720     0.232   3.104    0.002
                            AR                   1    0.485     0.036   13.593   0.000
                            MA                   2    -0.120    0.042   -2.8503  0.004
                            Seasonal difference       1
                            MA, seasonal         1    0.898     0.018   48.773   0.000
       Chuseok (Y/N)        Delay                     1
                            Numerator            0    58.599    16.363  3.5812   0.000
                            Seasonal difference       1
       Seasons              Numerator            0    -12.00    4.674   -2.568   0.010
                            Numerator            1    -13.67    4.665   -2.929   0.004
                            Seasonal difference       1
       Average temperature  Numerator            0    0.995     0.266   3.746    0.000
                            Seasonal difference       1
       Rain (Y/N)           Delay                     4
                            Numerator            0    9.125     3.187   2.863    0.004
                            Numerator            2    -10.33    3.200   -3.227   0.001
                            Seasonal difference       1

[A]: MA(2), [B]: univariate SARIMA(1,0,1)(0,1,1)₇, [C]: multivariate SARIMA(1,0,2)(0,1,1)₇, ED: emergency department,
SARIMA: seasonal auto-regressive integrated moving average.

IV. Discussion 

In this study, three models were developed to forecast the
number of patients visiting an ED per day: [A] a moving average
model, MA(2); [B] a univariate seasonal ARIMA model,
SARIMA(1,0,1)(0,1,1)₇; and [C] a multivariate seasonal
ARIMA model, SARIMA(1,0,2)(0,1,1)₇. A comparison of the
goodness of fit of the three forecasting models shows that
only the latter two models have residuals whose autocorrelations
fall within the confidence interval.

A comparison of the models' forecasting accuracy shows
the multivariate SARIMA model (SARIMA(1,0,2)(0,1,1)₇) to be
the most accurate, with a MAPE of 7.4%. The two SARIMA
models have a MAPE of less than 10%, thereby suggesting
a high degree of accuracy. Because the SARIMA models account
for autocorrelation and seasonality, they also tend to be more
accurate than the moving average model. It appears that the
multivariate seasonal ARIMA model can forecast the number of
visiting patients more accurately than the univariate model, as it
incorporates explanatory variables that affect that number
(Chuseok, seasons, average temperature, and absence or presence
of rain).






Batel et al. [25] previously demonstrated that the number of
visiting patients peaks on Monday and continues to decrease
until Sunday, whereas Lee et al. [19] demonstrated that the
number of patients per day begins to increase on Saturday,
peaks on Sunday, and then begins to decline on Monday
and remains low until Friday, exhibiting a 7-day cycle and
a seasonal trend. This may be attributable to differences in
the medical environments. Batel et al. [25] studied a setting with
walk-in clinics that operate for 15.5 hours over the entire week.
In this study, the number of patients visiting the ED rises on
Sundays and public holidays such as Seollal and Chuseok.
This is because outpatient clinics are closed on holidays,
which means that the ED does double duty.

Lee et al. [19] identified a weak correlation between the
maximal, minimal, and average temperature of the day and the
number of visiting patients, and reported no differences in
the number between rainy and non-rainy days. The same
was true for snowy and non-snowy days. Spencer et al.
[17] also argued that weather has only a minimal effect on the
number of patients who visit an ED. This study, however,
shows that the multivariate SARIMA model that incorporates
multiple weather factors generates the best results,
thus suggesting that weather, and rain in particular, affects
the number of visiting patients. Considering that ED patients
in Korea consist of three types, 1) those in an
emergency situation who require immediate treatment, 2)
non-emergency patients who seek treatment on holidays or
at night, and 3) non-emergency patients who seek outpatient
treatment without a doctor's referral, it can be inferred that
the non-emergency patients are impacted most profoundly
by weather conditions. In fact, Hwang et al. [14] reported that
2,276 out of 4,273 new patients (53.3%) obtain treatment at
an ED. Consequently, considering the number of
non-emergency patients who visit the ED, it is necessary to take
weather factors into account when constructing demand
forecasting models for emergency medical centers.

Figure 3. Observed and predicted daily emergency department
patients. Panels: moving average model, MA(2); univariate seasonal
ARIMA model, SARIMA(1,0,1)(0,1,1)₇; multivariate seasonal ARIMA
model, SARIMA(1,0,2)(0,1,1)₇. Each panel plots observed and
predicted values against date (09/1/1-09/3/31).

It will also be necessary to develop a variety of demand 
forecasting models that reflect local environments, since the 
studies conducted thus far have failed to take into consider- 
ation local medical systems, social and cultural backgrounds, 
and the relevant geographical factors. 

In conclusion, a comparison of the three constructed forecasting
models determined that the multivariate SARIMA model that
incorporates explanatory variables was the most appropriate for
forecasting the daily number of patients visiting the ED; this
model appears to reliably and accurately forecast the number of
patients visiting the ED per day. The results of this study
demonstrated that weather information, particularly temperature
and rain (or the absence thereof), should be considered when
attempting to predict the daily volume of ED patients. The proposed
prediction model can be used to forecast the daily patient
volume in the ED and to prepare for the previously mentioned
ED crowding problems through the allocation of staff
and resources. More detailed prediction models of the demand
on ED resources and the easing of overcrowding can be derived
from the proposed model by using more finely divided and
processed variables that affect staff supplementation, space
shortages, and the diversification of test equipment.

Table 4. Goodness of fit (AIC, BIC and normalized BIC) and MAPE values of the constructed models

        Training                                       Prediction
Model   AIC      BIC      Normalized BIC  MAPE         Normalized BIC  MAPE
------  -------  -------  --------------  -----------  --------------  ------
[A]     7,448.4  7,462.2  7.375           12.909       7.160           11.209
[B]     6,815.7  6,834.0  6.631           7.788        6.802           8.484
[C]a    6,703.7  6,749.5  6.568           7.372        6.991           7.437

AIC: Akaike information criterion, BIC: Bayesian information criterion, MAPE: mean absolute percentage error, [A]: MA(2),
[B]: univariate SARIMA(1,0,1)(0,1,1)₇, [C]: multivariate SARIMA(1,0,2)(0,1,1)₇.
a The multivariate SARIMA model was best in the performance measurements.